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^ Abstract 

^ In this study, we develop a new theory of estimating Hurst parame- 

ter using conic multivariate adaptive regression sphnes (CMARS) method. 
Wc concentrate on the strong solution of stochastic differentional equations 
^ (SDEs) driven by fractional Brownian motion (fBm). The superiority of our 

^ approach to the others is, it not only estimates the Hurst parameter but also 

^ finds spline parameters of the stochastic process in an adaptive way. We 

C examine the performance of our estimations using simulated test data. 
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^ 1. Introduction 

O 

CO Fractional Brownian motion (fBm) is a widely used concept for modelling 

various situations such as the level of water in a river, the temperature at 
a specific place, empirical volatility of a stock, the price dynamics of elec- 

^ tricity. It appears naturally in these phenomena because of its capability 

^ of explaining the dependence structure in real-life observations. This struc- 

ture in fBm is represented by its Hurst parameter H. A fBm with Hurst 
parameter if > 1/2 is called a persistent process, i.e., the increments of this 



* Corresponding author 
Em,ail addresses: fatmayerlikaya@gmail.com (F.Yerlikaya-Ozkurt), 
cvardar(Setu . edu . tr (C.Vardar-Acar), yyolcuSmetu. edu.tr (Y.Yolcu-Okur), 
gweber@metu.edu.tr (G.-W. Weber) 



Preprint submitted to JCAM 



February 15, 2013 



process are positively correlated. On the other hand, the increments of a 
fBm with H < 1/2 is called an anti-persistent process with increments being 
negatively correlated. For H = 1/2, fBm corresponds to Brownian motion 
which has independent increments. For further information on fBm and its 
applications, see [21 HSl [H] . 

It is highly important to identify the value of Hurst parameter in or- 
der to understand the structure of the process and its applications since 
the calculations dramatically differ according to the value of H. Therefore, 
some techniques have been developed to estimate Hurst parameter which 
can be categorized into three groups; heuristics, maximum likelihood and 
wavelet-based estimators. In the group of heuristics estimators, there is R/S 
estimator which was firstly proposed by Hurst f8], followed by the methods 
of correlogram, variogram, variance plot, and partial correlations plot. Due 
to lack of accuracy of heuristics estimators, maximum likelihood estimators 
(mle) were developed. Being weakly consistent is the main disadvantage of 
mle. In paralel to mle, wavelet-based estimators were suggested because of 
the popularity of wavelet decomposition of fBm [3l HE] . 

In search of faster and efficient ways to estimate the Hurst parameter 
H, we suggest a new numerical and computational way, conic multivariate 
adaptive regression splines (CMARS). CMARS is an alternative approach 
to the well-known data mining tool multivariate adaptive regression splines 
(MARS). It is based on a penalized residual sum of squares (PRSS) for MARS 
as a Tikhonov regularization (TR) problem. CMARS treats this problem by 
a continuous optimization technique, in particular, the framework of conic 
quadratic programming ( CQP). These convex optimization problems are very 
well-structured, herewith resembling linear programs and, hence, permitting 
the use of interior point methods. 

This paper is organized as follows; in Section 2, we start with explaning 
the properties of our madel given as, SDKs driven by fBm. In Section 3, we 
introduce the method CMARS relating it to the Hurst parameter estimation 
of our model. In Section 4, we give an application of our study, in order to 
test the theory we have developed. Finally, we present a brief conclusion and 
a general outlook of our study. 
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2. Stochastic Differential Equations with Fractional Brownian Mo- 
tion 

Stochastic Differential Equations (SDEs) generated by fBm are widely 

used to represent noisy and real-world problems. They play an important 
role in many fields of science such as finance, physics, biotechnology and 
engineering. In this section, we briefly recall some concepts on fBm and 
stochastic differential equations driven by fBm. 

2.1. Fractional Brownian Motion 

Let if be a constant in the interval (0, 1). FBm {W^ {t))t>o with Hurst 
parameter if, is a continuous and centered Gaussian process with covariance 
function 

E[W''{t)W''{s)] = hf" + -\t- 

We note that, for ii = 1/2, fBm corresponds to a standard Brownian motion 
which has independent increments. For a standard fBm, W^{t): 

• ly^(O) = and E[W^{t)] = for all t > 0. 

• has homogenous increments, i.e., W^{t + s) — W^{s) has the same 
law as W^{t), for all s,t>0. 

• is a Gaussian process and E[{W^{t)f] = {t > 0), for all 
iie (0,1). 

• has continuous trajectories. 

The Hurst parameter H of fBm explains the dependency of data. Indeed, 
the correlation between increments for s,t >0 can be obtained by; 

mw{t +h)- w{t)){W{s +h)- w{s))] = 

^[{n + 1)2^ + {n- 1)2^ - 2n2^]. 

It can be seen that observations with H > 1/2 have positively correlated 
increments and display long-range dependence, while the observations with 
H < 1/2 have a negatively correlated increments and display a short-range 
dependence structure (see Figure 1). Therefore, it is crucial to find the 
Hurst parameter of a stochastic process for understanding the structural 
behaviour of this phenomena. In this study, we concentrate on finding H for 
the stochastic processes which arc the strong solutions of SDEs with fBm. 
Hence, we first recall some fundamental properties of them. 
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Figure 1: Sample paths of fBm with different Hurst parameter values simulated by 
Cholesky method. 



2.2. Stochastic Differential Equations Driven by Fractional Brownian Motion 
Suppose we have a stochastic process X = {X(t); t > 0} defined on a 
fihered probabihty space J-", {J^t)t>o, P) which is the strong solution of the 
following SDE: 

dX{t) = a{t, X{t))dt + X{t))dW" {t). (1) 

Here, a(t, X(t)) and b{t,X{t)) are the drift and diffusion terms satisfying 
the conditions of existence and uniqueness theorem for t > 0. Note that 
it is necessary to have the integrator as a semi- martingale in the theory of 
stochastic integration. However, since fBm is not a semi-martingale, one 
should extend the usual settings as in Ito integral and define the integration 
with respect to fBm in a new pathwise integration technique. Alos et al. pQ 
construct the theory of integration with respect to general Gaussian proceses 
to overcome this. For further studies on this topic, see [131 E]- 

There have been comprehensive studies on statistical inferences for pro- 
cesses satisfying SDEs driven by Brownian motion. However, the recent in- 
terest is on SDEs driven by fBm since there have not been adequate studies 
on this topic. The purpose of this study is to estimate the Hurst parameter 
of the following SDE 

dX{t) = a{t, X{t))dt + b dW^{t), Xq G M, (2) 

by Conic Multivariate Adaptive Regression Splines ( CMARS) methodology. 
Note that b(t,X(t)) = b term in equation is taken as constant. 
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3. Estimation of Hurst Parameter Using Conic Multivariate Adap- 
tive Regression Splines Method 

In this section, as an alternative to the existing methods of estimation 
of Hurst parameter, CMARS and the related proposed methodology will be 
introduced. For that purpose, firstly, we give a brief description of CMARS 
method and then we mention about the methodology and show how to apply 
this technique for finding the Hurst parameter of SDE defined in equation 

3.1. Method of Conic Multivariate Adaptive Regression Splines 

CMARS method is an alternative approach to the well-known data min- 
ing tool Multivariate Adaptive Regression Splines (MARS). It makes no spe- 
cific assumption about the underlying functional relationship between the 
dependent and independent variables to estimate a general model function 
[5]. CMARS is introduced by linear combinations of the basis functions 
(BFs) that are used in MARS. The selection of BFs is data-based and spe- 
cific to the problem at hand. CMARS uses one-dimensional BFs of the form 
c"*"(x, r) = [+{x — r)]_|_ and c~(a;, r) = [— (x — t)] + , where [g]+ := max {0, q} 
(see [201 [22] for further details). Each function is piecewise linear, with a 
knot at the value r, and the corresponding couple of function is called a 
reflected pair. A set of BFs is given as follows: 

P ■= {{Xj ~^) + ,{^ -Xj)+ I T e {Xij, X2,j,. XN,j} , j e {l,2,...,p}}. 



}' 




Figure 2: Visualization of the BFs. 
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A CMARS model function / is represented by a linear combination of 
BFs which is successively built up by the set p as described below: 



M 



Y = fix) + e = eo + Y, Om^mixn + e- (3) 



m=l 



Here F is a response variable, x"^ = {xi,X2, ...,Xp)'^ a vector of predictors 
for the corresponding mth multivariate basis function. Furthermore, 6m are 
the unknown coefficients for the mth basis function (m = 1,2,..., M) or for 
the constant 1 (m = 0), and e is an additive stochastic component which is 
assumed to have zero mean and finite variance. In equation (|3]), ipm (fn = 
1,2, ...,M) are BFs as products of two or more one-dimentional BFs. Such 
interaction BFs are created by multiplying an existing basis function with a 
truncated linear function, involving a new variable. The form of the mth BF 
can be written as follows: 

Km 

il^mix"") ■■= Ylls^Y ■ (x^j^ - r^™)] + . (4) 
i=i 

Here, x"^ is the vector of variable contributed to the mth BF, is the 
number of truncated linear functions multiplied in the mth BF, x^™ is the 
predictor variable corresponding to the jth truncated linear function in the 
mth BF, r^™ is the knot value corresponding to the variable x^™, and s^"™ is 
the selected sign +1 or —1. 

CMARS is constructed by a Penalized Residual Sum of Squares (PRSS) 
parameter estimation problem, instead of an ordinary least-squares estima- 
tion problem as it occurs in MARS method. The PRSS problem aims at ac- 
curacy and a smallest possible complexity of the model. PRSS with penalty 
parameters and with Mmax BFs which are accumulated in the first part 
of the MARS algorithm, has the following form: 



N Mmax 



1=1 m=l |a|=l r<s V 

(5) 

where Vm '■= {/^J^lj = 1, 2, -ft'm} is the variable set associated with the 
mth BF, t"^ = (^m-^ , , truj^ j represents the vector of variables which 
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contribute to the mth BF. Moreover, D^^^tpmif^) := g^^r^^^i^'^) « = 
(ai, 02)"^, |q:| := «! + 0:2, where ai, ^2 G {0, 1}. 

After using the same penahy parameter A = A™, for each derivative, PRSS 
turns into a Tikhonov regularization problem as described below: 



PRSS 



y-il^{d)e \\\\Le\\l, (6) 



where L is constructed by the discretizations of the high-dimentional inte- 
grals given in equation (|5]). The model approximations as presented in equa- 
tion (|6| are carefully prepared. They play an important role in order to raise 
a final model approximation which is linear in the unknown spline parame- 
ters. After unifying some discretised complexity terms and including them 
into inequality constraints, a Conic Quadratic Programming (CQP) problem 
is obtained which uses interior point methods 116] . The formulation of 
CQP is given as follows: 



min t, 

t,e 

subject to 



y-ijj{d)e <t, ||X6/||2<VM, (7) 



referring to some chosen complexity bound M > 0. 

3.2. Discretization of Stochastic Differential Equations with Fractional Brow- 
nian Motion 

In general, the distribution of the stochastic process {X(t); t > 0} is not 
known. Therefore, the discretized version of the SDEs, (Xj)jgN, should be 
simulated [TD]. There are many discretization schemes for the SDEs gener- 
ated by fBm such as Euler and Milstein Scheme (see [6] for further details). 
In this study, Euler approximation is used since Milstein approximation con- 
tains the derivatives of the diffusion term in equation (|2| which is equal to 
zero. The Euler approximation of the equation ^ is: 



= X, + a(X„t,)(t,+i - t,) + b{X,,t,){Wil, - Wl") (z G N). (8) 

For finitely many given data points {i = 1,2, .. . , A^), the symbolic 

form of the approximation can be given as follows: 
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- - - - ^w" 
X, = a{Xi,U) + h{X,,U)^^, (9) 

where AWl^ = — W^^ is a centered Gaussian random variable, hi = 

ii+i — ii := Aii represents step lengths and 



^'+r^' if i = l,2,...,N -1, 



X, 



if i = N, 



represents the difference quotients raised on the ith data value. A more 
compact form of the equation (|9|) is defined by 



Xi = a, + Fid, (10) 

where Gi := a{Xi,ii), Fi := b{Xi,ii), and q := AW/^ /hi. Note that equation 
(10) can be considered as an approximation of the problem. The expressions 
stated until the end of Section 3 are described parametrically with respect to 
the Hurst parameter H. In Section 4, we shall specify it by numeric values. 

3. 3. Parameter Estimation 



To determine the unknown values in equation (10), the following min- 
imization problem is constructed using some abbreviated notation of the 
approximation [T9] : 



min - (Gi + Fid) ■ 

e ^ — ' II ^ '2 

i=l 

Here, 6 comprises all unknown parameters in the Euler approximation. To 
solve this optimization problem and to give a smoother, regularized approx- 
imation to the data, we employ CMARS method which controls any high 
"variation" in the data. CMARS' BFs are gradually constructed for the ap- 
proximation of Gi and Fj with data 0",^^, = {Xi,ti) according to the 
following approaches ^T] : 

Gi = ao + ^aiBi{ Ui^^), and Fid = /3o + ^(3mCm{ U7,c)- 

1=1 m=l 
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Here, the forms of the BFs are BiiU 



n 



k=l 



CmiU 



c, 



n 



[X 



c 



and 



r^)]+ as we described in equation (4). Here, 



we choose the numbers and (in sense of equation Q) as maximal, 
namely, as 2. 

We construct the penalized residual sum of squares (PRSS) for our min- 
imization problem in the following form: 



N 



PRSS 



J](X,-(G', + F,q))2 + 

i=l 

dB 2 

1 = 1 |a| = l r<s 

2 



m=l |a|=l '■<« 



Here, the multipliers A;, /i^ > are smoothing parameters and they pro- 
vide a tradeoff between both accuracy and complexity. To approximate two 
multi-dimensional integrals in equation (11), parallepipes Qf = [a[ ^, b[ x 



H,bAb] = m=iQi,B and Qg = [a^^c^Kc] x fe,6^c] = ULiQ^c 
which encompass all our input data are constructed. Then, the following 
discretization is applied for the first multi-dimensional integral: 



a, 



E 

1=1 



( 

2 

E E 

|a|=l 

\a.={ai,a2) 



a, 



|a|=l r<s 



(7V+1)2 



1=1 



Li ai 



\ 



The same discretization is also applied for the second multi-dimensional in- 



tegral in equation (11). For simphcity, we introduce PRSS in the following 
matrix notation: 



A 
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PRSS 



X- A6 



+ llLfai 



1=1 



where X 



T 



(13) 



a = (ao,ai,a2, •••,adfl) and f3 = (/3o, /32, , Gi + Fid = AiO, 



— i^^lh^ib ■■■yLfN+l)^l) — (-^fm; -^2m5 •••5 -^(Ar+l)2mj ^^^^ 

1, 2, ...,d^ and for m = 1, 2, d"-^, respectively [2T] . 

Using uniform penalization by taking the same A for each derivative term, 
the regularized approximation problem of PRSS turns into a Tikhonov regu- 
larization (TR) problem: 



PRSS 



X- AO 



A LO 



(14) 



Here, A = Ai 



A 



dB 



Hdc, and L is an {Mmax + 1) x {Mmax + 



l)-diagonal matrix with first column Lq = 



and the other columns 



being the vectors £f , introduced above, where Mmax = + . 

As we just mentioned in Subsection 3.1, TR problem can be solved by a 
CQP program as given in equation ([T]). In order to write the optimality con- 
dition for this problem, we firstly reformulate our program as the subsequent 
primal problem: 



min t, 

t,e 



such that X '■- 







N 







1 ol 

L 



0^ 





" t ' 




" -X " 
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+ 








' t ' 










+ 






6 




Vm 



(15) 



Here, L^"*"^, are the (A^ + 1)- and (Mma^. + 2)-dimensional ice-cream 

(or second-order) cones [15J. The dual problem to the latter problem is given 
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by 



0?; 



max (X',0)wi+ (Ol,_+i, 
















1 















(16) 



A primal-dual optimal solution {t, 6, x, Vi '^ii'^2) is obtained when the opti- 



mality conditions given in equation (17) are satisfied: 







N 
T 



On a 




' t ' 




' -X " 




1 0^ 




e 


+ 









OMmax + l L 






" t ' 




0R4max + l 










+ 






^Mmax + l . 














0^ 



0^ 








1 




U}2 = 




Ou^ax + l . 




_ Om^^^+1 



'^iX = 0, ^7 = 0, 



(17) 



4. Application and Results 

In order to test the theory developed in the previous section, we start with 
simulating stochastic process for a fixed Hurst parameter H using Cholesky 
method [3]. Now, our aim is to estimate the exact value of this Hurst parame- 
ter of the simulated data. We generate various stochastic processes which are 
the strong solution of SDEs driven by fBm with different Hurst parameters. 
Next, we construct CMARS model for each generated process to find the best 
fit. For the implementation of CMARS algorithm, BFs are built using Sal- 
ford MARS® software program [TTj as in fT2l[22]. The optimization problem 
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given in equation (15) is solved by using interior point methods (IPMs) via 
the optimization software MOSEK [lll|16]. Finally, we examine the perfor- 
mances of CMARS fits according to well-known performance measures such 
as mean absolute error (MAE), mean squared error (MSE), correlation coef- 
ficient (r), multiple coefficient of determination (-R^), adjusted B? (Adj-i?^), 
and proportion of residuals within three sigma (PWI). The steps described 
above are applied for H=0.2, H=0.3, H=0.7, and H=0.8. The results of the 
applications are summarized in Table 1. 



Table 1: CMARS performances for fBm generated by H=0.3, i7=0.7, and H=0.8. 





Performance IVIeasures 


Hurst index 


MAE 


MSE 


r 




Adj-R-^ 


PWI 


H = 0.1 


0,8766 


1,4827 


0,1923 


0,0370 


-0,1651 


1 


H = 0.2 


0,6207* 


0,7480* 


0,9868* 


0,9739* 


0,9684* 


1 


H = 0.3 


0,8861 


1,5266 


0,0991 


0,0098 


-0,1979 


1 


H = 0.4 


0,8770 


1,4733 


0,2075 


0,0430 


-0,1577 


1 


H = 0.5 


0,8839 


1,5201 


0,1281 


0,0164 


-0,1900 


1 


H = 0.1 


0,7138 


0,9776 


0,2607 


0,0679 


-0,1276 


1 


H = 0.2 


0,7162 


0,9901 


0,2400 


0,0576 


-0,1401 


1 


H = 0.3 


0,3606* 


0,2516* 


0,984* 


0,9699* 


0,9636* 


1 


H = 0.4 


0,6926 


0,9387 


0,3249 


0,1055 


-0,0821 


1 


H = 0.5 


0,7053 


0,9763 


0,2641 


0,0697 


-0,1254 


1 


H = 0.5 


0,7031 


0,9719 


0,4743 


0,2250 


0,0623 


1 


H = 0.6 


0,7048 


0,9784 


0,4688 


0,2198 


0,0560 


0,9898 


H = 0.7 


0,3634* 


0,2602* 


0,9582* 


0,9182* 


0,9010* 


1 


H = 0.8 


0,7041 


0,9506 


0,4919 


0,2419 


0,0828 


1 


H = 0.9 


0,7081 


0,9781 


0,4691 


0,2200 


0,0563 


1 


H = 0.5 


0,6068 


0,7841 


0,5914 


0,3498 


0,2133 


1 


H = 0.6 


0,6359 


0,8015 


0,5783 


0,3345 


0,1948 


1 


H = 0.7 


0,6053 


0,7389 


0,6176 


0,3815 


0,2517 


1 


H = 0.8 


0,2009* 


0,0822* 


0,9883* 


0,9768* 


0,9720* 


1 


H = 0.9 


0,6006 


0,7294 


0,6240 


0,3894 


0,2613 


1 



* indicates better performance 



In the case for anti-persistent processes, namely, H = 0.2, H = 0.3, the 
values of MAE and MSE are lower and the values of i?^, Adj-i?^, PWI and 
r are higher than the values for the other Hurst parameter values. Similar 
results are also obtained for the case of persistent processes. Hence, this 
shows that according to performance measures criteria, the best CMARS fit 
gives us the correct Hurst parameter value. 
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5. Conclusion and Outlook 

Recent developments in computer science provide environments in order 
to collect numerous data from various sources. Data mining methods enable 
us to analyze data for different purposes in many fields, such as finance, en- 
vironment, and energy. One of the modern method of data mining, CMARS, 
has been developed as an alternative to the backward stepwise part of the 
MARS algorithm (see [22]). 

This paper gave a new contribution to Hurst parameter estimation theory 
for the strong solution of SDEs driven by fBm using CMARS technique. The 
main superiority of our approach to the others is that it not only estimates 
the Hurst parameter but it also finds spline parameters of the stochastic 
process. What is more, our representation of financial and other processes 
is empowered by all the modeling and numerical advantages of CMARS. 
By this, a bridge has been offered between convex optimization and Hurst 
parameter estimation theory. 

In this pioneering paper, we followed a two-level approach with the deter- 
mination of the parameters at the lower level, except of the Hurst-parameter 
which was chosen at the following upper level. This approach can be re- 
garded as a parametric optimization (cf. [H])- In future research, we will 
deepen and extend this approach by both more model-free strategies (e.g., 
from statistics and data mining), especially, more model-based ones, and with 
a comparison of them. The model-based approaches will be of a more in- 
tegrated mathematical nature and in the analytical line that we initiated in 
this work. Through these investigations we intend to further contribute to 
a deeper understanding of our modern financial markets and to offer helpful 
mathematical decision tools for them. 
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